{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "NKbPFu-GWFDV"
   },
   "source": [
    "# Google El Carro for Oracle Workloads\n",
    "\n",
    "> Google [El Carro Oracle Operator](https://github.com/GoogleCloudPlatform/elcarro-oracle-operator)\n",
    "offers a way to run Oracle databases in Kubernetes as a portable, open source,\n",
    "community driven, no vendor lock-in container orchestration system. El Carro\n",
    "provides a powerful declarative API for comprehensive and consistent\n",
    "configuration and deployment as well as for real-time operations and\n",
    "monitoring.\n",
    "Extend your Oracle database's capabilities to build AI-powered experiences\n",
    "by leveraging the El Carro Langchain integration.\n",
    "\n",
    "This guide goes over how to use El Carro Langchain integration to\n",
    "[save, load and delete langchain documents](/docs/how_to#document-loaders)\n",
    "with `ElCarroLoader` and `ElCarroDocumentSaver`. This integration works for any Oracle database, regardless of where it is running.\n",
    "\n",
    "Learn more about the package on [GitHub](https://github.com/googleapis/langchain-google-el-carro-python/).\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-el-carro-python/blob/main/docs/document_loader.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ZqONzXRcWMJg"
   },
   "source": [
    "## Before You Begin\n",
    "\n",
    "Please complete\n",
    "the [Getting Started](https://github.com/googleapis/langchain-google-el-carro-python/tree/main/README.md#getting-started)\n",
    "section of\n",
    "the README to set up your El Carro Oracle database."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "imbbHxKfWPso"
   },
   "source": [
    "### 🦜🔗 Library Installation\n",
    "\n",
    "The integration lives in its own `langchain-google-el-carro` package, so\n",
    "we need to install it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Su5BMP2zWRwM"
   },
   "outputs": [],
   "source": [
    "%pip install --upgrade --quiet langchain-google-el-carro"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "azV0k45WWSVI"
   },
   "source": [
    "## Basic Usage\n",
    "\n",
    "### Set Up Oracle Database Connection\n",
    "Fill out the following variable with your Oracle database connections details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# @title Set Your Values Here { display-mode: \"form\" }\n",
    "HOST = \"127.0.0.1\"  # @param {type: \"string\"}\n",
    "PORT = 3307  # @param {type: \"integer\"}\n",
    "DATABASE = \"my-database\"  # @param {type: \"string\"}\n",
    "TABLE_NAME = \"message_store\"  # @param {type: \"string\"}\n",
    "USER = \"my-user\"  # @param {type: \"string\"}\n",
    "PASSWORD = input(\"Please provide a password to be used for the database user: \")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "\n",
    "If you are using El Carro, you can find the hostname and port values in the\n",
    "status of the El Carro Kubernetes instance.\n",
    "Use the user password you created for your PDB.\n",
    "\n",
    "Example Ouput:\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "```\n",
    "kubectl get -w instances.oracle.db.anthosapis.com -n db\n",
    "NAME   DB ENGINE   VERSION   EDITION      ENDPOINT      URL                DB NAMES   BACKUP ID   READYSTATUS   READYREASON        DBREADYSTATUS   DBREADYREASON\n",
    "\n",
    "mydb   Oracle      18c       Express      mydb-svc.db   34.71.69.25:6021   ['pdbname']            TRUE          CreateComplete     True            CreateComplete\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": false
   },
   "source": [
    "### ElCarroEngine Connection Pool\n",
    "\n",
    "`ElCarroEngine` configures a connection pool to your Oracle database, enabling successful connections from your application and following industry best practices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "xG1mYFkEWbkp"
   },
   "outputs": [],
   "source": [
    "from langchain_google_el_carro import ElCarroEngine\n",
    "\n",
    "elcarro_engine = ElCarroEngine.from_instance(\n",
    "    db_host=HOST,\n",
    "    db_port=PORT,\n",
    "    db_name=DATABASE,\n",
    "    db_user=USER,\n",
    "    db_password=PASSWORD,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "ICW3k_qUWgyv"
   },
   "source": [
    "### Initialize a table\n",
    "\n",
    "Initialize a table of default schema\n",
    "via `elcarro_engine.init_document_table(<table_name>)`. Table Columns:\n",
    "\n",
    "- page_content (type: text)\n",
    "- langchain_metadata (type: JSON)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "JmlGLukoWdfS"
   },
   "outputs": [],
   "source": [
    "elcarro_engine.drop_document_table(TABLE_NAME)\n",
    "elcarro_engine.init_document_table(\n",
    "    table_name=TABLE_NAME,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kaI3avj5Wn5O"
   },
   "source": [
    "### Save documents\n",
    "\n",
    "Save langchain documents with `ElCarroDocumentSaver.add_documents(<documents>)`.\n",
    "To initialize `ElCarroDocumentSaver` class you need to provide 2 things:\n",
    "\n",
    "1. `elcarro_engine` - An instance of a `ElCarroEngine` engine.\n",
    "2. `table_name` - The name of the table within the Oracle database to store\n",
    "   langchain documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "skaXpthSWpeg"
   },
   "outputs": [],
   "source": [
    "from langchain_core.documents import Document\n",
    "from langchain_google_el_carro import ElCarroDocumentSaver\n",
    "\n",
    "doc = Document(\n",
    "    page_content=\"Banana\",\n",
    "    metadata={\"type\": \"fruit\", \"weight\": 100, \"organic\": 1},\n",
    ")\n",
    "\n",
    "saver = ElCarroDocumentSaver(\n",
    "    elcarro_engine=elcarro_engine,\n",
    "    table_name=TABLE_NAME,\n",
    ")\n",
    "saver.add_documents([doc])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "owTYQdNyWs9s"
   },
   "source": [
    "### Load documents\n",
    "\n",
    "Load langchain documents with `ElCarroLoader.load()`\n",
    "or `ElCarroLoader.lazy_load()`.\n",
    "`lazy_load` returns a generator that only queries database during the iteration.\n",
    "To initialize `ElCarroLoader` class you need to provide:\n",
    "\n",
    "1. `elcarro_engine` - An instance of a `ElCarroEngine` engine.\n",
    "2. `table_name` - The name of the table within the Oracle database to store\n",
    "   langchain documents.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "CM6p11amWvYp"
   },
   "outputs": [],
   "source": [
    "from langchain_google_el_carro import ElCarroLoader\n",
    "\n",
    "loader = ElCarroLoader(elcarro_engine=elcarro_engine, table_name=TABLE_NAME)\n",
    "docs = loader.lazy_load()\n",
    "for doc in docs:\n",
    "    print(\"Loaded documents:\", doc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "OTIDGiZ8WyS3"
   },
   "source": [
    "### Load documents via query\n",
    "\n",
    "Other than loading documents from a table, we can also choose to load documents\n",
    "from a view generated from a SQL query. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "p3OB9AwgWzrq"
   },
   "outputs": [],
   "source": [
    "from langchain_google_el_carro import ElCarroLoader\n",
    "\n",
    "loader = ElCarroLoader(\n",
    "    elcarro_engine=elcarro_engine,\n",
    "    query=f\"SELECT * FROM {TABLE_NAME} WHERE json_value(langchain_metadata, '$.organic') = '1'\",\n",
    ")\n",
    "onedoc = loader.load()\n",
    "print(onedoc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "E6Fl7YNvW3Ep"
   },
   "source": [
    "The view generated from SQL query can have different schema than default table.\n",
    "In such cases, the behavior of ElCarroLoader is the same as loading from table\n",
    "with non-default schema. Please refer to\n",
    "section [Load documents with customized document page content & metadata](#load-documents-with-customized-document-page-content--metadata)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QgsP78MhW4wc"
   },
   "source": [
    "### Delete documents\n",
    "\n",
    "Delete a list of langchain documents from an Oracle table\n",
    "with `ElCarroDocumentSaver.delete(<documents>)`.\n",
    "\n",
    "For a table with a default schema (page_content, langchain_metadata), the\n",
    "deletion criteria is:\n",
    "\n",
    "A `row` should be deleted if there exists a `document` in the list, such that\n",
    "\n",
    "- `document.page_content` equals `row[page_content]`\n",
    "- `document.metadata` equals `row[langchain_metadata]`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "QSYRHGHXW6IN"
   },
   "outputs": [],
   "source": [
    "docs = loader.load()\n",
    "print(\"Documents before delete:\", docs)\n",
    "saver.delete(onedoc)\n",
    "print(\"Documents after delete:\", loader.load())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RerPkBRAW8yR"
   },
   "source": [
    "## Advanced Usage\n",
    "\n",
    "### Load documents with customized document page content & metadata\n",
    "\n",
    "First we prepare an example table with non-default schema, and populate it with\n",
    "some arbitrary data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "u0Fd46aqW-8k"
   },
   "outputs": [],
   "source": [
    "import sqlalchemy\n",
    "\n",
    "create_table_query = f\"\"\"CREATE TABLE {TABLE_NAME} (\n",
    "    fruit_id NUMBER GENERATED BY DEFAULT AS IDENTITY (START WITH 1),\n",
    "    fruit_name VARCHAR2(100) NOT NULL,\n",
    "    variety VARCHAR2(50),\n",
    "    quantity_in_stock NUMBER(10) NOT NULL,\n",
    "    price_per_unit NUMBER(6,2) NOT NULL,\n",
    "    organic NUMBER(3) NOT NULL\n",
    ")\"\"\"\n",
    "elcarro_engine.drop_document_table(TABLE_NAME)\n",
    "\n",
    "with elcarro_engine.connect() as conn:\n",
    "    conn.execute(sqlalchemy.text(create_table_query))\n",
    "    conn.commit()\n",
    "    conn.execute(\n",
    "        sqlalchemy.text(\n",
    "            f\"\"\"\n",
    "            INSERT INTO {TABLE_NAME} (fruit_name, variety, quantity_in_stock, price_per_unit, organic)\n",
    "            VALUES ('Apple', 'Granny Smith', 150, 0.99, 1)\n",
    "            \"\"\"\n",
    "        )\n",
    "    )\n",
    "    conn.execute(\n",
    "        sqlalchemy.text(\n",
    "            f\"\"\"\n",
    "            INSERT INTO {TABLE_NAME} (fruit_name, variety, quantity_in_stock, price_per_unit, organic)\n",
    "            VALUES ('Banana', 'Cavendish', 200, 0.59, 0)\n",
    "            \"\"\"\n",
    "        )\n",
    "    )\n",
    "    conn.execute(\n",
    "        sqlalchemy.text(\n",
    "            f\"\"\"\n",
    "            INSERT INTO {TABLE_NAME} (fruit_name, variety, quantity_in_stock, price_per_unit, organic)\n",
    "            VALUES ('Orange', 'Navel', 80, 1.29, 1)\n",
    "            \"\"\"\n",
    "        )\n",
    "    )\n",
    "    conn.commit()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "hGPYiTu7XBh3"
   },
   "source": [
    "If we still load langchain documents with default parameters of `ElCarroLoader`\n",
    "from this example table, the `page_content` of loaded documents will be the\n",
    "first column of the table, and `metadata` will be consisting of key-value pairs\n",
    "of all the other columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "eQbRapM_XC1S"
   },
   "outputs": [],
   "source": [
    "loader = ElCarroLoader(\n",
    "    elcarro_engine=elcarro_engine,\n",
    "    table_name=TABLE_NAME,\n",
    ")\n",
    "loaded_docs = loader.load()\n",
    "print(f\"Loaded Documents: [{loaded_docs}]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "tOH6i2jWXFqz"
   },
   "source": [
    "We can specify the content and metadata we want to load by setting\n",
    "the `content_columns` and `metadata_columns` when initializing\n",
    "the `ElCarroLoader`.\n",
    "\n",
    "1. `content_columns`: The columns to write into the `page_content` of the\n",
    "   document.\n",
    "2. `metadata_columns`: The columns to write into the `metadata` of the document.\n",
    "\n",
    "For example here, the values of columns in `content_columns` will be joined\n",
    "together into a space-separated string, as `page_content` of loaded documents,\n",
    "and `metadata` of loaded documents will only contain key-value pairs of columns\n",
    "specified in `metadata_columns`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "9gCFWqgGXHD3"
   },
   "outputs": [],
   "source": [
    "loader = ElCarroLoader(\n",
    "    elcarro_engine=elcarro_engine,\n",
    "    table_name=TABLE_NAME,\n",
    "    content_columns=[\n",
    "        \"variety\",\n",
    "        \"quantity_in_stock\",\n",
    "        \"price_per_unit\",\n",
    "        \"organic\",\n",
    "    ],\n",
    "    metadata_columns=[\"fruit_id\", \"fruit_name\"],\n",
    ")\n",
    "loaded_docs = loader.load()\n",
    "print(f\"Loaded Documents: [{loaded_docs}]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "4KlSfvPJXKgM"
   },
   "source": [
    "### Save document with customized page content & metadata\n",
    "\n",
    "In order to save langchain document into table with customized metadata fields\n",
    "we need first create such a table via `ElCarroEngine.init_document_table()`, and\n",
    "specify the list of `metadata_columns` we want it to have. In this example, the\n",
    "created table will have table columns:\n",
    "\n",
    "- content (type: text): for storing fruit description.\n",
    "- type (type VARCHAR2(200)): for storing fruit type.\n",
    "- weight (type INT): for storing fruit weight.\n",
    "- extra_json_metadata (type: JSON): for storing other metadata information of the\n",
    "  fruit.\n",
    "\n",
    "We can use the following parameters\n",
    "with `elcarro_engine.init_document_table()` to create the table:\n",
    "\n",
    "1. `table_name`: The name of the table within the Oracle database to store\n",
    "   langchain documents.\n",
    "2. `metadata_columns`: A list of `sqlalchemy.Column` indicating the list of\n",
    "   metadata columns we need.\n",
    "3. `content_column`: column name to store `page_content` of langchain\n",
    "   document. Default: `\"page_content\", \"VARCHAR2(4000)\"`\n",
    "4. `metadata_json_column`: column name to store extra\n",
    "   JSON `metadata` of langchain document.\n",
    "   Default: `\"langchain_metadata\", \"VARCHAR2(4000)\"`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "1Wqs05gpXMW9"
   },
   "outputs": [],
   "source": [
    "elcarro_engine.drop_document_table(TABLE_NAME)\n",
    "elcarro_engine.init_document_table(\n",
    "    table_name=TABLE_NAME,\n",
    "    metadata_columns=[\n",
    "        sqlalchemy.Column(\"type\", sqlalchemy.dialects.oracle.VARCHAR2(200)),\n",
    "        sqlalchemy.Column(\"weight\", sqlalchemy.INT),\n",
    "    ],\n",
    "    content_column=\"content\",\n",
    "    metadata_json_column=\"extra_json_metadata\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "bVEWHYU-XPFt"
   },
   "source": [
    "Save documents with `ElCarroDocumentSaver.add_documents(<documents>)`. As you\n",
    "can see in this example,\n",
    "\n",
    "- `document.page_content` will be saved into `content` column.\n",
    "- `document.metadata.type` will be saved into `type` column.\n",
    "- `document.metadata.weight` will be saved into `weight` column.\n",
    "- `document.metadata.organic` will be saved into `extra_json_metadata` column in\n",
    "  JSON format.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Iy4wRZLPXQn5"
   },
   "outputs": [],
   "source": [
    "doc = Document(\n",
    "    page_content=\"Banana\",\n",
    "    metadata={\"type\": \"fruit\", \"weight\": 100, \"organic\": 1},\n",
    ")\n",
    "\n",
    "print(f\"Original Document: [{doc}]\")\n",
    "\n",
    "saver = ElCarroDocumentSaver(\n",
    "    elcarro_engine=elcarro_engine,\n",
    "    table_name=TABLE_NAME,\n",
    "    content_column=\"content\",\n",
    "    metadata_json_column=\"extra_json_metadata\",\n",
    ")\n",
    "saver.add_documents([doc])\n",
    "\n",
    "loader = ElCarroLoader(\n",
    "    elcarro_engine=elcarro_engine,\n",
    "    table_name=TABLE_NAME,\n",
    "    content_columns=[\"content\"],\n",
    "    metadata_columns=[\n",
    "        \"type\",\n",
    "        \"weight\",\n",
    "    ],\n",
    "    metadata_json_column=\"extra_json_metadata\",\n",
    ")\n",
    "\n",
    "loaded_docs = loader.load()\n",
    "print(f\"Loaded Document: [{loaded_docs[0]}]\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "x0vkL7PKXUmU"
   },
   "source": [
    "### Delete documents with customized page content & metadata\n",
    "\n",
    "We can also delete documents from table with customized metadata columns\n",
    "via `ElCarroDocumentSaver.delete(<documents>)`. The deletion criteria is:\n",
    "\n",
    "A `row` should be deleted if there exists a `document` in the list, such that\n",
    "\n",
    "- `document.page_content` equals `row[page_content]`\n",
    "- For every metadata field `k` in `document.metadata`\n",
    "    - `document.metadata[k]` equals `row[k]` or `document.metadata[k]`\n",
    "      equals `row[langchain_metadata][k]`\n",
    "- There is no extra metadata field present in `row` but not\n",
    "  in `document.metadata`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "OcJPeCuKXWSa"
   },
   "outputs": [],
   "source": [
    "loader = ElCarroLoader(elcarro_engine=elcarro_engine, table_name=TABLE_NAME)\n",
    "saver.delete(loader.load())\n",
    "print(f\"Documents left: {len(loader.load())}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "S4SxUoY-XsPN"
   },
   "source": [
    "## More examples\n",
    "\n",
    "Please look\n",
    "at [demo_doc_loader_basic.py](https://github.com/googleapis/langchain-google-el-carro-python/tree/main/samples/demo_doc_loader_basic.py)\n",
    "and [demo_doc_loader_advanced.py](https://github.com/googleapis/langchain-google-el-carro-python/tree/main/samples/demo_doc_loader_advanced.py)\n",
    "for\n",
    "complete code examples.\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
